26 research outputs found

    Taming Strings in Dynamic Languages - An Abstract Interpretation-based Static Analysis Approach

    Get PDF
    In the recent years, dynamic languages such as JavaScript, Python or PHP, have found several fields of applications, thanks to the multiple features provided, the agility of deploying software and the seeming facility of learning such languages. In particular, strings play a central role in dynamic languages, as they can be implicitly converted to other type values, used to access object properties or transformed at run-time into executable code. In particular, the possibility to dynamically generate code as strings transformation breaks the typical assumption in static program analysis that the code is an immutable object, indeed static. This happens because program\u2019s essential data structures, such as the control-flow graph and the system of equation associated with the program to analyze, are themselves dynamically mutating objects. In a sentence: "You can\u2019t check the code you don\u2019t see". For all these reasons, dynamic languages still pone a big challenge for static program analysis, making it drastically hard and imprecise. The goal of this thesis is to tackle the problem of statically analyzing dynamic code by treating the code as any other data structure that can be statically analyzed, and by treating the static analyzer as any other function that can be recursively called. Since, in dynamically-generated code, the program code can be encoded as strings and then transformed into executable code, we first define a novel and suitable string abstraction, and the corresponding abstract semantics, able to both keep enough information to analyze string properties, in general, and keep enough information about the possible executable strings that may be converted to code. Such string abstraction will permits us to distill from a string abstract value the executable program expressed by it, allowing us to recursively call the static analyzer on the synthesized program. The final result of this thesis is an important first step towards a sound-by- construction abstract interpreter for real-world dynamic string manipulation languages, analyzing also string-to-code statements, that is the code that standard static analysis "can\u2019t see"

    Analyzing Dynamic Code: A Sound Abstract Interpreter for evil eval

    Get PDF
    Dynamic languages, such as JavaScript, employ string-to-code primitives to turn dynamically generated text into executable code at run-time. These features make standard static analysis extremely hard if not impossible because its essential data structures, i.e., the control-flow graph and the system of recursive equations associated with the program to analyze, are themselves dynamically mutating objects. Nevertheless, assembling code at run-time by manipulating strings, such as by eval in JavaScript, has been always strongly discouraged since it is often recognized that \u201ceval is evil", leading static analyzers to not consider such statements or ignoring their effects. Unfortunately, the lack of formal approaches to analyze string-to-code statements pose a perfect habitat for malicious code, that is surely evil and do not respect good practice rules, allowing them to hide malicious intents as strings to be converted to code and making static analyses blind to the real malicious aim of the code. Hence, the need to handle string-to-code statements approximating what they can execute, and therefore allowing the analysis to continue (even in presence of dynamically generated program statements) with an acceptable degree of precision, should be clear. In order to reach this goal, we propose a static analysis allowing us to collect string values and to soundly over-approximate and analyze the code potentially executed by a string-to-code statement

    Static Program Analysis for String Manipulation Languages

    Get PDF
    In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a hard challenge for static analysis of these programming languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. Moreover, string obfuscation is very popular in the context of dynamic language malicious code, for example, to hide code information inside strings and then to dynamically transform strings into executable code. In this scenario, more precise string analyses become a necessity. This paper is placed in the context of static string analysis by abstract interpretation and proposes a new semantics for string analysis, placing a first step for handling dynamic languages string features.Comment: In Proceedings VPT 2019, arXiv:1908.0672

    Abstract Domains for Type Juggling

    Get PDF
    Web scripting languages, such as PHP and JavaScript, provide a wide range of dynamic features that make them both flexible and error-prone. In order to prevent bugs in web applications, there is a sore need for powerful static analysis tools. In this paper, we investigate how Abstract Interpretation may be leveraged to provide a precise value analysis providing rich typing information that can be a useful component for such tools. In particular, we define the formal semantics for a core of PHP that illustrates type juggling, the implicit type conversions typical of PHP, and investigate the design of abstract domains and operations that, while still scalable, are expressive enough to cope with type juggling. We believe that our approach can also be applied to other languages with implicit type conversions

    Static Analysis for ECMAScript String Manipulation Programs

    Get PDF
    In recent years, dynamic languages, such as JavaScript or Python, have been increasingly used in a wide range of fields and applications. Their tricky and misunderstood behaviors pose a great challenge for static analysis of these languages. A key aspect of any dynamic language program is the multiple usage of strings, since they can be implicitly converted to another type value, transformed by string-to-code primitives or used to access an object-property. Unfortunately, string analyses for dynamic languages still lack precision and do not take into account some important string features. In this scenario, more precise string analyses become a necessity. The goal of this paper is to place a first step for precisely handling dynamic language string features. In particular, we propose a new abstract domain approximating strings as finite state automata and an abstract interpretation-based static analysis for the most common string manipulating operations provided by the ECMAScript specification. The proposed analysis comes with a prototype static analyzer implementation for an imperative string manipulating language, allowing us to show and evaluate the improved precision of the proposed analysis

    Static analysis for dummies: experiencing LiSA

    Get PDF
    Semantics-based static analysis requires a significant theoretical background before being able to design and implement a new analysis. Unfortunately, the development of even a toy static analyzer from scratch requires to implement an infrastructure (parser, control flow graphs representation, fixpoint algorithms, etc.) that is too demanding for bachelor and master students in computer science. This approach difficulty can condition the acquisition of skills on software verification which are of major importance for the design of secure systems. In this paper, we show how LiSA (Library for Static Analysis) can play a role in that respect. LiSA implements the basic infrastructure that allows a non-expert user to develop even simple analyses (e.g., dataflow and numerical non-relational domains) focusing only on the design of the appropriate representation of the property of interest and of the sound approximation of the program statements

    Abstract Domains for Type Juggling

    Get PDF
    Web scripting languages, such as PHP and JavaScript, provide a wide range of dynamic features that make them both flexible and error-prone. In order to prevent bugs in web applications, there is a sore need for powerful static analysis tools. In this paper, we investigate how Abstract Interpretation may be leveraged to provide a precise value analysis providing rich typing information that can be a useful component for such tools. In particular, we define the formal semantics for a core of PHP that illustrates type juggling, the implicit type conversions typical of PHP, and investigate the design of abstract domains and operations that, while still scalable, are expressive enough to cope with type juggling. We believe that our approach can also be applied to other languages with implicit type conversions

    Information Flow Analysis for Detecting Non-Determinism in Blockchain

    Get PDF
    A mandatory feature for blockchain software, such as smart contracts and decentralized applications, is determinism. In fact, non-deterministic behaviors do not allow blockchain nodes to reach one common consensual state or a deterministic response, which causes the blockchain to be forked, stopped, or to deny services. While domain-specific languages are deterministic by design, general purpose languages widely used for the development of smart contracts such as Go, provide many sources of non-determinism. However, not all non-deterministic behaviours are critical. In fact, only those that affect the state or the response of the blockchain can cause problems, as other uses (for example, logging) are only observable by the node that executes the application and not by others. Therefore, some frameworks for blockchains, such as Hyperledger Fabric or Cosmos SDK, do not prohibit the use of non-deterministic constructs but leave the programmer the burden of ensuring that the blockchain application is deterministic. In this paper, we present a flow-based approach to detect non-deterministic vulnerabilities which could compromise the blockchain. The analysis is implemented in GoLiSA, a semantics-based static analyzer for Go applications. Our experimental results show that GoLiSA is able to detect all vulnerabilities related to non-determinism on a significant set of applications, with better results than other open-source analyzers for blockchain software written in Go

    Information Flow Analysis for Detecting Non-Determinism in Blockchain (Artifact)

    Get PDF
    A mandatory feature for blockchain software, such as smart contracts and decentralized applications, is determinism. In fact, non-deterministic behaviors do not allow blockchain nodes to reach one common consensual state or a deterministic response, which causes the blockchain to be forked, stopped, or to deny services. While domain-specific languages are deterministic by design, general-purpose languages widely used for the development of smart contracts such as Go, provide many sources of non-determinism. However, not all non-deterministic behaviours are critical. In fact, only those that affect the state or the response of the blockchain can cause problems, as other uses (for example, logging) are only observable by the node that executes the application and not by others. Therefore, some frameworks for blockchains, such as Hyperledger Fabric or Cosmos SDK, do not prohibit the use of non-deterministic constructs but leave the programmer the burden of ensuring that the blockchain application is deterministic. In this paper, we present a flow-based approach to detect non-deterministic vulnerabilities which could compromise the blockchain. The analysis is implemented in GoLiSA, a semantics-based static analyzer for Go applications. Our experimental results show that GoLiSA is able to detect all vulnerabilities related to non-determinism on a significant set of applications, with better results than other open-source analyzers for blockchain software written in Go

    SEA: String Executability Analysis by Abstract Interpretation

    Get PDF
    Dynamic languages often employ reflection primitives to turn dynamically generated text into executable code at run-time. These features make stan- dard static analysis extremely hard if not impossible because its essential data structures, i.e., the control-flow graph and the system of recursive equa- tions associated with the program to analyse, are themselves dynamically mutating objects. We introduce SEA, an abstract interpreter for automatic sound string executability analysis of dynamic languages employing bounded (i.e, finitely nested) reflection and dynamic code generation. Strings are stat- ically approximated in an abstract domain of finite state automata with basic operations implemented as symbolic transducers. SEA combines standard program analysis together with string executability analysis. The analysis of a call to reflection determines a call to the same abstract interpreter over a code which is synthesised directly from the result of the static string exe- cutability analysis at that program point. The use of regular languages for approximating dynamically generated code structures allows SEA to soundly approximate safety properties of self modifying programs yet maintaining ef- ficiency. Soundness here means that the semantics of the code synthesised by the analyser to resolve reflection over-approximates the semantics of the code dynamically built at run-rime by the program at that point
    corecore